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Abstract 

The Reverse Water Gas Shift system (RWGS) is a 
complex, physical system designed to produce oxy- 
gen from the carbon dioxide atmosphere on Mars. If 
sent to Mars, it would operate without human super- 
vision, thus requiring a reliable automated system for 
monitoring and control. The RWGS presents many 
challenges typical of real-world systems, including: 
noisy and biased sensors, nonlinear behavior, effects 
that are manifested over different time granularities, 
and unobservability of many important quantities. In 
this paper we model the RWGS using a hybrid (dis- 
crete/c ontinuous) Dynamic Bayesian Network (DBN), 
where the state at each time slice contains 33 discrete 
and 184 continuous variables. We show how the sys- 
tem state can be tracked using probabilistic inference 
over the model We discuss how to deal with the var- 
ious challenges presented by die RWGS, providing a 
suite of techniques that are likely to be useful in a 
wide range of applications. In particular, we describe 
a general framework for dealing with nonlinear behav- 
ior using numerical integration techniques, extending 
the successful Unscented Filter. We also show how 
to use a fixed-point computation to deal with effects 
that develop at different time scales, specifically rapid 
changes occurring during slowly changing processes. 

We test our model using real data collected from the 
RWGS, demonstrating the feasibility of hybrid DBNs 
for monitoring complex real-world physical systems. 

1 Introduction 

The Reverse Water Gas Shift System (RWGS) shown in 
Fig. 1 is a complex physical system designed and con- 
structed at NASA’s Kennedy Space Center to produce oxy- 
gen from carbon dioxide. NASA foresees a number of pos- 
sible uses for the RWGS, including producing oxygen from 
the atmosphere on Mars and converting carbon dioxide to 
oxygen within closed human living quarters. 

In a manned Mars mission, the RWGS would operate 
for 500 or more days without human intervention [Larson 
and Goodrich, 2000]. This level of autonomy requires the 
development of robust and adaptive software for fault diag- 
nosis and control. In this paper, we focus on two key sub- 
tasks — monitoring and prediction. Monitoring, or track- 
ing the current state of the system, is a crucial component 



Figure 1: The Prototype RWGS System 


of the control system. Prediction of the system’s expected 
behavior is a basic tool in fault diagnosis — discrepancies 
between the predicted and the actual behavior of the system 
may indicate the presence of faults. 

The RWGS presents a number of significant modeling 
and algorithmic challenges. From a modeling perspec- 
tive, the system is very complex, and contains many sub- 
tle phenomena that are difficult to model accurately. Var- 
ious phenomena in the system manifest themselves over 
dramatically different time scales, ranging from pressure 
waves that propagate on a time scale of milliseconds to 
slow changes such as gas composition that take hours to 
evolve. From a tracking perspective, the system dynamics 
are complex and highly nonlinear. Furthermore, the sen- 
sors give only a limited view of the system state. Some key 
quantities of the system are not measured, and the available 
sensors are noisy and biased, with both the noise level and 
the bias varying with the system state. 

In this paper we model the RWGS using a hybrid (dis- 
crete/continuous) Dynamic Bayesian Network (DBN), and 
show how the system state can be tracked using probabilis- 
tic inference over the model. We focus on the continuous 
part of the model, assuming all the discrete variables are 
known. We discuss how to deal with the various challenges 
presented by the RWGS, both in terms of modeling and in 
terms of inference. We provide a suite of techniques that 
are likely to be useful in a wide range of applications, in- 


eluding the ease where the discrete variables are not ob- 
served. 

Perhaps the most interesting modeling problem pre- 
sented by the RWGS is the issue of different time granu- 
larities. A naive solution is to discretize time at the finest 
granularity. Unfortunately, this approach is generally in- 
feasible both because of the computational burden and be- 
cause the number of observations is effectively reduced to 
one for every few thousand time steps, leading to serious 
inaccuracies. Instead, we take the approach of modeling 
the system at the time granularity of the observations. We 
show how to deal with the almost instantaneous changes 
relative to our time discretization by modeling a part of our 
system as a set of fixed-point equations. 

For the inference task, we provide some new insights 
into the problem of tracking nonlinear systems. This task 
is commonly performed using the Extended Kalman Fil- 
ter (EKF) [Bar-Shalom et al L, 2001] or the simpler and 
more accurate Unscented Filter (UF) [Julier and Uhhnann, 
1997]. We view the problem as a numerical integration 
problem and demonstrate that the UF is an instance of a 
numerical integration technique. More importantly, our ap- 
proach naturally leads to important generalizations of the 
UF: We show how to take advantage of the structure of the 
DBN and present a spectrum of filters, trading off accuracy 
with computational effort. 

We tested our model using real data collected from the 
RWGS prototype system. Our results demonstrate the po- 
tential of using hybrid DBNs as a monitoring tool for com- 
plex real-world physical systems. 

2 Preliminaries 

In this paper, we characterize physical systems as discrete- 
time stochastic processes. System behavior is described in 
terms of a system state which evolves stochastically at dis- 
crete time steps / = 0-1.2 We assume that the system 

is Markovian and stationary , i.e., the state of the system 
at time / -f 1 only depends on its state at time /, and the 
probabilistic dependencies are the same for all / . 

The system state is modeled by a set of random vari- 
ables .V - (.Vi X v ] . We partition the state vari- 

ables .V into a set of evidence (observed) variables, E , and 
a set of hidden (unobserved) variables, H. Physical sys- 
tems commonly comprise both continuous quantities (e.g., 
flows, pressures, gas compositions) and discrete quanti- 
ties (e.g., valve open/closed, compressor on/off). Conse- 
quently, we model such systems as hybrid systems, with X 
comprising both discrete and continuous variables. 

We model the process dynamics of our system using a 
Dynamic Bayesian Network (DBN) [Dean and Kanazawa, 
1989]. A DBN is represented as a Bayes Net fragment 
called a 2TBN, which defines the transition model P{ X* | 
X) where X f — f.Yi — , V fr , ] denotes the variables at 
time / -f 1 and X = (A'i , . . . , .V,/] denotes some subset of 
the variables at time / which are persistent ; in that their val- 
ues directly influence the next state. More formally, a DBN 


is a directed acyclic graph, whose nodes are random vari- 
ables in two consecutive time slices, X and X' . The edges 
in the graph denote direct probabilistic influence between 
the parents and their child. For every variable V' at time 
/ -f 1 we denote its parents as Par( .Y') C X U X'. Each 
.V' is also annotated with a Conditional Probability Dis- 
tribution (CPD), that defines the local probability model 
P(X l | Par( V' )). In our hybrid model, discrete nodes do 
not have continuous nodes as parents. 

The tracking problem in DBNs is to find the belief state 

distribution Bel(X') P{X f \ r* 1 , r'), where X 1 typ- 
ically consists of the persistent variables X at time /, and 
r 1 . c 1 are the evidence variables from time 1 to time I . 
The belief state summarizes our beliefs about the state of 
the system at time /, given the observations from time 1 
to time / . As such, it makes current and future predictions 
independent of past data. The tracking algorithm is an it- 
erative process that propagates the belief state. We start 
with the belief state at time /, Bel(X') and perform three 
steps. We first compute P(X\ X' +1 | r 1 , ...,r / ) as the 
product Bel(X i )/ , (X /+1 | X'). Next we marginalize out 
X' resulting in a distribution over X /+1 . Finally, we con- 
dition on and the result is the belief state at / -f 1, 
Bel(X /+1 ). 

Linear models are an important class of DBNs. In a 
linear model, all the variables in X are continuous and 
all the dependencies are linear with some added Gaussian 
noise. More precisely, if a node X has parents Yi , .... Y* 

then P{ X | Yi H) = ]T* =1 + V, where the ?/ r ’s 

are constants and V has a normal distribution A'i/t.rr 2 ). 
In a dynamic linear model, tracking can be done using a 
Kalman filter [Kalman, I960], where the belief state is rep- 
resented parametrically as a multivariate Gaussian in terms 
of the mean vector and the covariance matrix. Kalman fil- 
ters therefore allow a compact belief state representation, 
which can be propagated in polynomial time and space. 

When the dependencies in the model are nonlinear, the 
resulting distributions are generally non-Gaussian and can- 
not be represented in closed form. Consequently, the belief 
state is generally approximated as a multivariate Gaussian 
that preserves the first two moments of the true distribu- 
tion. The traditional method for doing this approximation 
is using an Extended Kalman Filter (EKF) [Bar-Shalom et 
al , 2001]. Assume that X* — /(X), where / is some 
nonlinear function and X ^ ,V(//. Y). Note that we can 
always assume that f is deterministic: If the dependency 
between X and X 1 is stochastic we can treat the stochas- 
ticity as extra random variables that f takes as arguments. 
The EKF finds a linear approximation to / around the mean 
of X, i.e., we approximate / using the first-order Taylor 
series expansion around //. The result is the linear func- 
tion /(X) » /(/#) + Yf\ ft (X - /#), where Yf\ ft is the 
gradient of / evaluated at p} 


l A second-order EKF approximation exists, but its increased 
complexity tends to limit its use. 



The EKF has two serious disadvantages. The first is its 
inaccuracy — the EKF is accurate only if the second and 
higher-order terms in the Taylor series expansion are neg- 
ligible. In many practical situations, this is not the case 
and using the EKF leads to a poor approximation. The 
second disadvantage is the need to compute the gradient. 
Some nonlinear functions may not be differentiable (e.g., 
the max function), preventing the use of an EKF. Even 
if the function is differentiable, computing the derivatives 
may be hard if the function is represented as a black box 
rather than in some analytical form. 

The Unscented Filter (UF) [Julier and Uhhnann, 1997] 
provides an alternative approach to tracking nonlinear be- 
havior. As with the EKF, the UF assumes that X r = /( X ) 
and X — A'(/#, £). The UF works by deterministically 
choosing 2d ■+ 1 points *o, * 2./* where arn = /# and the 
other points are symmetric around // (die actual points de- 
pend on ^). Associated with each point is a weight tiy. 
The UF computes x J =■ /( 37 ) for i = 0, 1 , . . . , 2 d 9 result- 
ing in 2d + 1 points in 1R T7? , from which it estimates the first 
two moments of X 1 as a weighted average of the In 

particular, the mean F[X'] is approximated as Y/ito «’«*»{ • 

The UF has several significant advantages over the EKF. 
First, it is easier to implement and use than the EKF — no 
derivatives need be computed, and the function / is simply 
applied to 2d + 1 points. Second, despite its simplicity, the 
UF is more accurate than the EKF: The UF is a third-order 
approximation, i.e., inaccuracies are induced only by terms 
of degree four or more in the Taylor series expansion. Fi- 
nally, instead of just ignoring the higher-order terms, the 
UF can account for some of their effects, by tuning a pa- 
rameter used in the point selection. As shown in [Julier and 
Uhlmann, 1997], the UF can be extremely accurate, even in 
cases where the EKF leads to a poor approximation. 

3 The RWGS System 

The purpose of the RWGS is to decompose carbon diox- 
ide (C0 2 ) (abundant on Mars) into oxygen (O2) and carbon 
monoxide (CO). The system, shown in Fig. 2(a) [Goodrich, 
2002], comprises two loops: a gas loop that converts CO? 
and hydrogen (H 2 ) into H 2 0 and CO, and a water loop that 
electrolyzes the H 2 0 to produce 0 2 and H 2 . Under normal 
operation, C0 2 at line (1) is combined with H 2 returned 
from the electrolyzer via line (12), and a mixture of C0 2 , 
H 2 , and CO from the reactor recycle line (11). This mixture 
enters a catalyzed reactor (3) heated to I00°C. Approxi- 
mately 10% of the COo and H*> react to form CO and H 2 0: 
C0 7 + IF =±CO - h IUO 

The H 2 0 is condensed at (4) and is stored in a tank (5). The 
remaining gas mixture passes through a separation mem- 
brane (9), which sends a fraction of the CO to the vent (13) 
while directing the remaining mixture into the recycle line 
( 1 1). A compressor ( 10) is used to maintain the necessary 
pressure differential across the membrane. In the water 
loop, the H 2 0 in tank (5) has some C0 2 dissolved in it, 
which would be detrimental to the electrolyzation process. 


To remedy this, the H?0 is pumped into a second tank (6), 
and has Ho bubbled through it to purge the COo. From 
there, the H 2 0 is pumped into the electrolyzer (8), which 
separates a portion of it into 0 2 and H 2 . The H 2 re-enters 
the gas loop via ( 12), while the remaining H 2 0, along with 
the 0 2 , goes into tank (7), where the mixture is cooled and 
separated. The HoO returns to the electrolyzer, while the 
0 2 leaves the system through ( 14). 

In addition to its normal operating mode, the system 
may operate without the electrolyzer and water pumps. In 
this mode, the Ho for the reaction is supplied by a supply 
line (15) paralleling the C0 2 supply line. This option is not 
feasible for operation on Mars, but has proven useful for 
testing the physical system while under development. 

The RWGS is an interconnected nonlinear system 
where the various components influence each other in com- 
plicated and sometimes unexpected ways. For example, 
during runs without the electrolyzer, it is necessary to 
empty the water tank (5) periodically, to prevent water from 
accumulating and eventually overflowing the tank. This 
causes the gases in the tank to expand, and thus creates a 
significant and sudden pressure drop, which affects the flow 
throughout the whole system. This phenomenon is demon- 
strated in Fig. 2(b), taken from [Whitlow, 2001]. The graph 
shows the flow through the CO vent ( 13) as it evolves over 
time — the spikes correspond to emptying the water tank. 

A challenging property of the RWGS is that phenomena 
in the system manifest themselves over at least three differ- 
ent time scales. Pressure waves in the RWGS propagate 
essentially instantaneously (at the speed of sound). Gases 
flow around the gas loop on the order of seconds. Finally, 
gas compositions in the gas loop take on the order of hours 
to reach a steady state. Meanwhile, the sensors collect data 
at a sampling rate of one second. 

An additional challenge of the RWGS is its sensitivity 
and unidentifiability, i.e., parts of the system state are very 
sensitive to input paramaters and are not directly measured. 
For example, the H 2 and C0 2 compositions in the gas loop 
cannot be practically measured. However, the balance be- 
tween these compositions is almost neutrally stable, thus 
a small shift in the input conditions or the membrane be- 
havior will cause the balance to gradually drift to a signifi- 
cantly different value. 

As in any real system, the RWGS sensors do not record 
the underlying state exactly. In addition to some impor- 
tant quantities, such as the gas compositions, which are not 
measured at all, the existing sensors are noisy and biased. 
The noise level of the sensors depends on many factors and 
can change over time. An example is shown in Fig. 2(c), 
where the difference in the readings of the pressure sensors 
P3 and Pa (both located at (2) in Fig. 2(a)) is plotted over 
time. The main reason for the noise in time steps 0-800 
is the physical proximity of the sensors to the compressor 
that sends pressure waves throughout the system. Since 
the sensors are not synchronized with the compressor, they 
take measurements at various phases of the pressure waves 
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Figure 2: (a) The RWGS Schematic, (b) Effects of emptying a water tank, (c) Pressure difference between P* and P 4 . 


and thus measure significantly different values. After 796 
seconds the compressor shuts down and the noise level de- 
creases dramatically. 2 More interestingly, we note that 
the two sensors are placed very close together and thus the 
average difference should be zero. However, as the plot 
demonstrates, this is not the case, indicating that the sen- 
sors are not well calibrated and some bias is present. Fur- 
thermore this bias depends on the system state, as shown by 
the change in the average difference when the compressor 
shuts off. 

4 Modeling the RWGS 

We model the RWGS using a hybrid DBN, as described in 
Section 2. The 2TBN has 293 nodes, 227 of which are con- 
tinuous. Currently the discrete variables in the model are 
all known and correspond to computer-controlled switches 
and sensor faults. The continuous variables in our model 
capture the continuous-valued elements of our system (e.g., 
pressure at various points in the system, flow rates, tem- 
peratures, gas composition, etc.). Of the 227 continuous 
nodes, 43 represent the time / belief state X and 1 84 repre- 
sent the variables X' at time i + 1 . Of the latter, 43 variables 
are belief state variables for / -f 1 , 72 variables are encap- 
sulated variables, as discussed in Section 5.4, and the rest 
are either sensor variables or transient variables. 

When constructing the model, we used four techniques 
for parameter estimation. Some of the parameters were 
known physical constants or system properties. Of the em- 

2 The sensor’s noise is literally noise that can be heard — the 
pressure waves are the sound waves generated by the compressor. 


pin cal parameters, many came from physical models. The 
others (specifically, some parameters for the compressor, 
the separation membrane and the overall system pressure 
changes) were determined using common equations that 
model the particular system behavior. All the variables in 
these equations were directly observed in the data, and thus 
we could use least-squares techniques to find the best fit for 
the parameters. The remaining parameters were estimated 
using prior knowledge of the domain. 

4.1 Sensor Modeling 

As discussed in Section 3, one of the challenges we address 
in modeling the RWGS is dealing with noisy and biased 
sensors. We deal with noisy sensors in the obvious way: 
by increasing the variance of the predicted measurement 
values to match the noise level in the data. 

Sensor biases present a more interesting modeling prob- 
lem. The biases are not easily modeled using a simple pa- 
rameter since they are unknown and can drift over time. In- 
stead, we address the problem by adding hidden variables 
to the belief state that model the different biases of the sen- 
sors. Biases start with zero mean and a reasonably large 
variance and persist over time, i.e., Bias' =r Bias' + V, 
where \ ' represents white noise with a relatively small vari- 
ance, allowing for some amount of drift to occur over time. 

This idea works quite well, but it tends to overfit the 
data: By letting the bias account for every discrepancy be- 
tween the model predictions and the actual sensor measure- 
ments, the tracking algorithm might settle in an incorrect 
steady state. To fix the problem we must make sure that 
the model biases reflect true sensor biases — biases should 



be kept as small as possible and allowed to grow only if 
there is a real reason for that. We implement this idea by 
introducing a contraction factor -> < 1 (empirically set to 
be 0.07) into the bias formula: Bias'* 1 = -) * Bias' *f V\ 
Thus, biases tend to go to zero unless doing so introduces 
a systematic discrepancy with the predicted system state. 

4.2 Sensitivity and Unidentifiability 

Recall that the equations governing the H 2 /CO 2 balance in 
the gas loop are sensitive to slight variations in the physical 
parameters. Thus even using the most exact form of these 
equations in the model will result in (at least) the same level 
of sensitivity — both to variations in the physical parame- 
ters, and inherent errors in the parameters. Moreover, the 
model value is also sensitive to model effects such as cal- 
culation errors and sensor errors that do not affect the real 
value. We therefore use equations for the IWCO? balance 
that contain an intentionally non-physical component — a 
stabilizing teim — that reduces the sensitivity. This term 
drives the balance to a pre-detennined point, which in this 
case is our expected value for the balance. The magnitude 
of this term is manually adjusted to provide an optimum 
tradeoff between physical accuracy and model stability. 

43 Differing Time Scales 

As described in Section 3, we must deal with differing time 
scales in modeling the RWGS. The naive solution to this 
problem is to model the DBN at a very fine time granularity. 
However, it is completely impractical to model the behav- 
ior of the pressure waves using a discretized-time model. 
To do so would require time steps three orders of magni- 
tude smaller than the time between measurements, which 
is a significant waste of resources. Furthermore, it would 
require a much more complete description of the system 
than is practical, and tracking the slowly-evolving aspects 
of the system with a step size many orders of magnitude be- 
low their time scale would allow substantial errors to build 
up. 

Thus, we approximate the pressure waves as occurring 
instantaneously and instead of modeling their transient be- 
havior, we model the quasi-steady-state results at each time 
step after they have reached an equilibrium. The equa- 
tions in this case are substantially simpler, and require far 
fewer empirical constants. The difficulty, however, is that 
these equations must be solved simultaneously; a change 
in any part of the system will affect all of the other parts. 
These equations include both the compressor equation and 
an approximation to the membrane equations developed 
in [Whitlow, 2001]; thus, they are fairly large and nonlin- 
ear, and no direct simultaneous solution form exists. In- 
stead, we use these equations to create a new equation that 
converges to a fixed point solution. 

We must insert this fixed-point equation into a (nonlin- 
ear) CPD to use it in our DBN model of the RWGS. The 
equation solves for the five model variables Z that account 
for the flows and pressure of file gas loop. In order to solve 


for all five variables, their eight parents must also be present 
in the CPD. Hence, we have a vector CPD for Z whose def- 
inition is essentially procedural: given a value of the eight 
parents it executes an iterative fixed-point computation un- 
til convergence, and outputs the values Z. 

5 Tracking In Nonlinear Systems 

In this section, we address the problem of inference, fo- 
cusing on tracking in complex nonlinear systems, such as 
the RWGS. In these models, the probabilistic dependen- 
cies, including sensors, can be either linear or nonlinear 
functions with Gaussian noise. We restrict our attention to 
the task of tracking the continuous state, assuming all the 
discrete values are known. Note that although the results in 
this section are presented in terms of dynamical systems, 
the analysis also applies to probabilistic inference in static 
nonlinear Bayes nets. 

5.1 Exploiting DBN Structure 

Recall the setup from Section 2: We have a Gaussian belief 
state Bel(X) where X £ 1R / and a 2TBN representing 
P(X ' | X) as a deterministic function X* — f(X). Our 
goal is to find an approximation of P{ X f ) as a multivariate 
Gaussian. The classical approach, used in the EKF and 
the UF, is to find the entire distribution P{ X') directly by 
treating / as a function from 1R' 7 to JR ,r \ An alternative 
approach is to decompose / by defining AT — //(TV) for 
7 = 1 , . . . . in f where Y, ~ Par( AT). In most practical 
cases the /, ’s have a lower dimension than /; as we shall 
see, this reduction in the dimension lets us approximate the 
resulting distribution more accurately and efficiently. 

As discussed in Section 2, the first step in the be- 
lief state propagation process is to compute a multivari- 
ate Gaussian over (X. X*}. We begin with our Gaus- 
sian Bel(X), and add the variables from X' one at a 
time, using the procedure described in Section 5.2. The 
key insight is that, as X- is conditionally independent of 
fX — Yi, -V] AT., ] given Y ti it suffices to approx- 

imate the Gaussian P(TV AT). We can then compute 

rix. av..,at)= HX,.v; at_,)p(at | y.), 

which, for Gaussians, can be accomplished using simple 
linear algebra operations. 

A more difficult case arises when the DBN contains not 
only inter-temporal edges from X to X f , but also intra- 
temporal edges between X f variables. In this case we 
sort the variables Xj in topological order, and gradually 

build up the joint distribution P(X , .V] AT). The 

topological order ensures that when we need to compute 
P(TV AT), we have already computed a Gaussian over 

Yi C XUf .VJ AT., } . This approach, however, may 

introduce some new inaccuracies, because we now also use 
a Gaussian approximation for the distribution of the rele- 
vant variables from {.Y{ , AT_i ] . 

Even in cases where we introduce extra inaccuracies, 
this method is often superior to the UF. The reason is that, 
by reducing the dimension of the functions involved, we 



can use more accurate techniques to approximate the first 
two moments of the variables in X' with the same compu- 
tational resources. In general, there is a tradeoff between 
the superior precision we achieve for each variable and the 
potential for extra inaccuracies we introduce. The extra in- 
accuracies depend on the quality of our Gaussian approxi- 
mation for P(X. X\ .Y^), and on the extent of the 

nonlinearity of the dependencies within X If the depen- 
dence of Y' on f X \ , . . . , Y ■ _ 1 ] is linear, then there are 
no extra errors introduced: In this case the first two mo- 
ments of X} are only influenced by the first two moments 
of f Y' , . . . , .Y;_! } which can be captured correctly by our 
Gaussian approximation. It is somewhat reassuring that the 
better our approximation of P(X f ) as a Gaussian is, the 
less significant the extra errors we introduce are, as the en- 
tire framework is based on the assumption that P( X 1 ) can 
be well approximated by a Gaussian. 

5.2 Numerical Integration 

We now turn our attention to the task of approximating 
P{Y ,Y f 0 as a multivariate Gaussian. To simplify our 
notation, let X be a variable which is a nonlinear func- 
tion of its parents Y — Vi,... .V?, i.e., X - f{Y\ 
but the ensuing discussion also holds for the vectoT case of 
X - f{Y). We assume that P(Y) is a known multivariate 
Gaussian, and the goal is to find a Gaussian approximation 
for P(Y, .Y). It suffices to compute the first two moments: 


m = 

j P{Y)J(Y)dY 

( 1 ) 

nx*} = 

f P{Y)f-{Y)dY 

( 2 ) 

n[\Yj] = 

j P{Y)f{Y)YjdY 

( 3 ) 


Note that the integrals only involve the direct parents 
of Y, significantly reducing their dimension. We can ef- 
fectively compute these integrals using a version of the 
Gaussian Quadrature method called the Exact Monomial 
rules [Davis and Rabinovitz, 1984]. Generally speaking, 
Gaussian Quadrature approximates integrals using a for- 
mula of the form: 

f v 

I W(x)f(x)dx fis ^ «•;/(*;) 

J - 1 

where IT 7 ( x) is a known function (a Gaussian in our case). 
The points x ; and weights vj are carefully chosen to en- 
sure that this approximation is exact for any polynomial / 
whose degree is at most p. The degree p is called the pre- 
cision of the approximation. 

Finding a set of points with a minimal size V for some 
precision p is not a trivial task. In the simple form of Gaus- 
sian Quadrature, we choose points in one dimension and 
use them to create a grid of points in IR J with the obvious 
disadvantage that V grows exponentially with d. Fortu- 
nately, we can do better. In [McNamee and Stenger, 1967] 
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Figure 3: (a) Density estimates for Y = y/Yp -Mo 2 . (b) 
Random samples from the RWGS network for the flow at 
point (16) and the pressure at point (2), and Gaussian esti- 
mates for the distribution. 

a general method is presented for .V - O (i^-) and pre- 
cision p - 2k + 1 (d is the dimension of the integral, in 
our case \Y\). In particular, rules are presorted for 2d + 1 
points with precision 3, 2d 2 + 1 points with precision 5 
and + §</+! points with precision 7. The precision 3 
rule is exactly the rule used for the Unscented Filter: It has 
exactly the same 2d + 1 points and weights. 

This view of the Unscented Filter has immediate prac- 
tical consequences: we can trade off between the accuracy 
of the computation and its computational requirements. For 
example, if we are interested in a more precise filter than 
the Unscented Filter and are willing to evaluate the func- 
tion at f>(d 2 ) points then we can use the exact monomial 
rule of precision 5. Depending on the function, this may 
represent a significant gain in accuracy. 

As a simple example we consider the nonlinear function 
.V = yVf + V, 2 where P(V,) = A'(2,-l) and P(V 2 | 
Vi) = A'(0.5Vi - 1,3) (note that both V, andV 2 have die 
same variance I). Fig. 3(a) shows various estimates for the 
probability of X. The optimal estimate is the best Gaussian 
approximation for the distribution of A' computed using a 
very exact numerical integration rule. We can see that the 
exact monomial rules of precisions 3 and 5 provide a much 
better estimate than EKF, where the precision 5 rule leads 
to a more accurate estimate than the precision 3 rule. 

53 Inaccuracies In the Approximation 

Unfortunately, approximating P(Y, X) using numerical 
integration can lead to covariance matrices that are not 
semi-positive definite, and hence illegal. One simple ap- 
proach to this problem is to use a more accurate integration 
rule, although the problem may persist. An alternative is to 
find the “closest” positive definite covariance matrix. We 
cast this problem as a convex optimization problem follow- 
ing [Boyd and Vandenberghe, 2003]. 

Consider once again the problem of approximating 
P[Y , Y) as a multivariate Gaussian, where Y is a non- 
linear function of its parents Y, i.e., X = f(Y\ and 
Y ~ A'(/<v» -rr). Let V denote the estimated covari- 



ance matrix for P(Y, .V): 



If n and f lead to a matrix 31 that is not positive definite, 
then we need to find the closest u and r to fi and r, such 
that Y is positive definite. Given that 3yy is already pos- 
itive definite, 31 is positive definite iff t — » T Ey y M > 0. 
Thus, we can formalize our problem as follows: 

Minimize || n - n || 2 +(r - r)~ (4) 

Subject to n r 3yyU — r + c < 0 (5) 

where c is some small positive number. Since both Eq. 4 
and Eq. 5 are convex we can solve this problem by form- 
ing the Lagrangian and solving the dual problem. We set 
the partial derivatives of it and r to zero and plug the result 
into Eq. 5. We get an equation over the Lagrangian multi- 
plier which can be solved easily as it involves a monotonic 
function. We omit details for lack of space. 

Our analysis treats the elements in n and r directly, 
but in fact these elements are not independent since i/,* — 
n[Y,X] - // r , n[.X] and r = r[X~] - 77[.Y] 3 * * * * * . It is desir- 
able to use this relation in Eq. 4 and Eq. 5 and represent the 
dependency between the various elements (e.g., a change in 
/7[.Y] may fix many of the problems simultaneously). Un- 
fortunately, because of the tenn 77[.Y] 2 the problem is no 
longer convex. Nonetheless, we can approximate the prob- 
lem as convex (e.g., by replacing 77 [.Y] 2 by the best current 
estimate), solve it and iterate. Again, we defer details to an 
extended version of this paper. 

5.4 Encapsulated Variables 

Just as we can use the DBN structure to decompose the de- 
pendency between X 9 and X , in many cases we can further 
decompose the dependency Y — /( Y). For example, as- 
sume that f(Y) — g{g\ (Y i ). g?(Y?)), where Yi , Y? C 
Y ? Instead of directly evaluating the Gaussian over 
(Y, .Yj we can define two extra variables: Ti = g\ ( Yy ) 
and T 2 = < 7 ?(Y 2 ). We first approximate P(Yi.Ti) as 
a Gaussian and use it to find a Gaussian over {Y,T|}. 
Next we approximate P[ Y 2 , To) as a Gaussian and from it 
P( Y. Tt , T 2 ). Finally, we approximate P(T i , To, Y) as a 
Gaussian and use it to find the Gaussian approximation for 
P( Y. Ti , To, .Y). The same accuracy tradeoffs that were 
discussed in the context of X 1 — J(X) apply here: by 
reducing the dimension of the integrals we can solve each 
one more accurately, but may introduce further errors if the 
interaction between the extra variables is nonlinear. 

3 E.g., flow sensors give different results depending on the gas 

type. Assuming we have random variables representing the total 

flow and the compositions of the different gases in it, and 

may each be a product of one of the gas compositions and the 

flow, thus representing the net flow of a certain gas. The func- 

tion g would be a weighted sum of these flows where the weights 

correspond to the sensor’s response for the different gases. 



Figure 4: Comparison with particle filtering on simulated 
data, showing the means and error bars of two standard de- 
viations for our algorithm and particle filtering. The Y axis 
represents time, and the 3 ’ axis the percentage of H 2 in the 
flow at point (16). To increase readability, we shift the es- 
timates generated by our algorithm by 0.1 to the left and 
those generated by particle filtering by 0.1 to the right. 

In principle, one could add T f and T 2 to the DBN and 
treat them as regular variables. However, doing so makes 
these variables part of X\ and thereby increases the al- 
gorithm’s space complexity, which is 0{ [X'l 2 ) (for repre- 
senting the covariance matrix of P[ X 1 )). It is better to treat 
the extra variables as local variables encapsulated within 
the CPD and unknown to the rest of the network. After 
computing the Gaussian approximation for the CPD vari- 
ables, we simply marginalize over the encapsulated ones. 
This approach is similar to the local computations in an 
OOBN model [Koller and Pfeffer, 1997], where some of 
the CPD variables are encapsulated within the CPD. 

6 Experimental Results 

In this section we present results from a set of experiments 
that test the efficacy and robustness of our model and track- 
ing algorithm. Our computational model of the RWGS con- 
tains all of the components needed to monitor the full op- 
eration of the physical system, although data provided to 
date by KSC is for the reduced-operation mode with only 
the gas loop component operational. Our experiments were 
run on a Pentium III 700MHz. 

We tested our algorithm with both real data and simu- 
lated data that was generated from our model. Although 
running with real data is the real test for our approach, run- 
ning with simulated data is also of interest. The reason is 




that there are two sources of errors when using real data: 
model inaccuracies and errors induced by the algorithm. 
When using simulated data, only errors of the second type 
are present and we can better test the performance of the 
algorithm. 

6.1 Results on Simulated Data 

We first tested whether the belief state could be well ap- 
proximated as a Gaussian and whether our particular ap- 
proximation was a good one. To do so, we generated a set 
of samples from the model. We did not introduce any evi- 
dence so the samples were indeed sampled from the correct 
joint distribution. In Fig. 3(b) we show the results for two 
particular variables: the flow at point ( 1 6) and the pressure 
at point (2) (these variables were chosen because of their 
dependency on the non-linear CPD of the membrane; other 
variables produced similar results). The samples appear to 
be drawn from a distribution that is either a Gaussian or 
close to one. Furthermore, our estimate for the joint distri- 
bution (depicted by the contours for one and two standard 
deviations) is very close to the Gaussian that was estimated 
directly from the samples. Thus, it is reasonable to expect 
that our techniques will lead to good approximations of the 
belief state. 

Next, we generated a trajectory of 500 time steps from 
our model and tested our algorithm on it. We compared 
our results with the particle filtering algorithm [Gordon et 
al . , 1993], which approximates the belief state as a set of 
weighted samples where the weights of the samples corre- 
spond to the likelihood of die evidence given the sample. 
Our algorithm took 20ms per time step, which included 
computing the Gaussian approximation to the belief state, 
with numerical integration when necessary, and condition- 
ing on the evidence. In comparison, generating a sample 
using particle filtering took 1 .5ms. Thus, one step of our al- 
gorithm took as much time as generating 13 samples. How- 
ever, with just 13 samples particle filtering performed ex- 
tremely poorly and therefore in our experiments we used 
10,000 samples at every time step, giving particle filtering 
a somewhat unfair advantage. 

Fig. 4 shows the estimates for the percentage of Ho in 
the flow at point ( 16) that were computed by our algorithm 
and by particle filtering, as well as the actual value (known 
from the simulated data). We report the results on this par- 
ticular variable because the gas compositions are not mea- 
sured by any sensors and are therefore a potential challenge 
to our algorithm. The error bars represent the uncertainty 
of the estimates as plus and minus two standard deviations 
(for particle filtering we computed the standard deviation 
induced by the weighted samples). 

Although under our setup particle filtering was slower 
than our algorithm by a factor of 750, as Fig. 4 demon- 
strates, the estimates of particle filtering are not as good as 
the estimates of our algorithm. Over the entire sequence the 
average error of our algorithm was 0.000 while the average 
error of particle filtering was 0.01 3. Nevertheless, the more 


dramatic difference is in the estimates of the variance. Of- 
ten, the estimated variance for particle filtering is extremely 
small, even when the estimated value is not very accurate 
(e.g., time steps 72 and 73). In fact, over the entire se- 
quence, according to the estimated distribution of our algo- 
rithm, the correct value of the H? composition was within 
two standard deviations 96% of the time (this is consistent 
with the fact that the probability mass within two standard 
deviations from a Gaussian mean is 95%). In comparison, 
for particle filtering, the true value was within two esti- 
mated standard deviations only 20% of the time. The dif- 
ference was even more apparent when we computed the av- 
erage log-likelihood of the true value, given the two possi- 
ble estimates. For our algorithm the average log-likelihood 
was 3.1 while for particle filtering it was only —5.50 • 1 0 1 1 . 

The reason for this problem is the relatively high dimen- 
sion of the evidence which leads to a very high variance for 
the weights of the samples. Although we generated 10,000 
samples at each time step only a very small number of them 
had a significant effect on the estimate. Over the entire se- 
quence, in 65% of the time steps one sample accounted for 
more than 0.5 of the total probability mass, in 27% one 
sample accounted for more than 0.9 of the mass, and in 
15% one sample accounted for more than 0.99. Obviously 
in cases where one sample completely dominates the rest, 
the estimates of particle filtering are not very reliable and 
in particular the variance estimates can be extremely small 
and misleading. 

Thus, not only is our algorithm faster than particle fil- 
tering with 10,000 samples by a factor of 750, its estimates 
are much more reliable. 

6.2 Results on Real Data 

We next ran a set of experiments on real data. Our data set 
consisted of a long sequence of 13,875 time steps, most of 
it collected while the system was running in steady state. 
We divided our data into a training set, used to estimate 
and tune the model parameters, and a test set on which we 
report our results. 

We conducted a variety of experiments in which we 
compared model predictions with the actual measurements 
recorded by the system under various scenarios: steady 
state and non-steady state, removing sensors, and modify- 
ing the sensor models. In order to make the comparison 
informative, the model predictions for values at time / + 1 
as reported in this section are not adjusted with evidence at 
time /-hi, i.e., they are the predictions based on evidence 
from times 0, 1, . ./. 

Our first experiment, shown in Fig. 5(a), illustrates the 
efficacy of our tracking algorithm during steady-state op- 
eration of the system. In particular, the graph illustrates 
the predicted (thick lines) and measured (thin lines) pres- 
sures, P?, and Pa at point (2) in Fig. 2(a). Observe that 
the predicted value for P?, appears to be consistently lower 
than the measurement. This is the result of the model's 
bias weighting, •) - .07, discussed in Section 4.1, which 



(a) P?/Pa (Steady-State) Pressure (b) Flow at point (1 1 ) w/ Missing Sensors (c) Ro Flow w/ Reduced Sensor Variance 



(d) R? Flow without) Bias (e) Pressure Transition (0 Flow Transition 


Figure5: Experimental Results Tracking the RWGS. The .Y axis represents time, and the Y axis the value of the appropriate 
quantity. 


tends to pull the estimates slightly away from the measured 
value. While, in this case, it produces a slightly power re- 
sult, overall, the bias weighting technique does less data 
overfitting and works better in non-steady state sequences. 

Next we experimented with “removing” sensors from 
the system. (This is easily achieved by ignoring selected 
sensor evidence when running the tracking algorithm.) 
Sensor removal can be used to evaluate the robustness of 
the algorithm as well as to determine the importance of a 
sensor for monitoring the system. In Fig. 5(b), we show 
the flow of gas from the compressor at point (11). The two 
overlaid lines are our estimates of this flow value — one 
with all of the sensors, and the other with sensors ft <» and 
R] n (located at ( 13)) removed. In contrast, when flow sen- 
sor R* (located at (16)) is removed, the predicted flow rate 
quickly strays. These results indicate that, at least for this 
sequence, /?s is a more valuable sensor than /??. and R i n. 

We also tested the effects of changing the liquid level 
(LL) sensor noise parameter 4 on our prediction of the gas 
flow ffo at (13). Recall from Section 4.1 that to correctly 
model a sensor we introduced both some Gaussian noise on 
the sensor and a hidden bias variable. We tried both a vari- 

4 The liquid level sensor is very noisy, as splashing and bub- 
bling from the dissolved CO? and from drops splashing from the 
condenser hit the sensor rod and create considerable noise in the 
sensor reading. 


ance value of 0.01 , which we estimated using “reasonable” 
prior knowledge, and a variance value of 4 which was fit to 
the data. Fig. 5(c) shows the effect of the variance of the LL 
sensor for the water tank at (5). With the fitted variance, the 
algorithm tracked quite well. In contrast, with the smaller 
variance, the performance was poor and erratic, following 
the fluctuations in the LL measurements. 

The utility of the bias variables is shown in Fig. 5(d). 
The upper line is a prediction of the flow rate, made using 
a version of the model that contained no bias variables for 
the flow sensors at (10), (13) and (16). The middle line 
corresponds to the model with the bias variables present, 
but shows the prediction for the true (unbiased) flow (i.e., 
the sensor prediction minus the bias). When we explicitly 
modeled the sensor bias, our (unbiased) predictions of the 
true system state better matched the measurements, an in- 
dication of a better estimate of the system state. 

Finally, we tested the ability of the model to track non- 
steady-state behavior — in particular, the behavior of the 
system when the CO? supply is turned off during the shut- 
down process. Unfortunately, we only had one data set con- 
taining this transition, and thus we expect our parameters 
are still not tuned optimally. In addition, having only one 
such transition in our data, we report results on the same 
data that was used for training. 

Fig. 5(e) shows a comparison between the predicted and 







measured output from pressure sensors P?> and Pa, for two 
versions of the model. The first set of predictions, shown in 
solid lines, was calculated using our best estimates of the 
empirical parameters, including the membrane area (calcu- 
lated from other parts of the data set) of 27. 1 . The second 
set of predictions, shown in dashed lines, was calculated us- 
ing an earlier estimate of the membrane area of 3 1 .6. While 
in the steady-state prior to timestep 220, the two predic- 
tions are equivalent as the differences were absorbed into 
the bias errors, in the transient part, the model with inaccu- 
rate parameters underpredicts the initial drop in pressure, 
and retains this error throughout the rest of the sequence. 

Fig. 5(f) presents the predictions of the correct model 
for the flows at ( 16) and fti? ( 10), over a longer period 
of time. Initially, when the CO? supply was cut off, the 
flows dropped; however, gradually the CO and CO? in the 
system were vented and the only remaining gas was H?. As 
the membrane presented less resistance to H? the flow rates 
started to go up. The model tracked this complex behavior 
surprisingly well. 


7 Conclusions and Future Work 

In this paper we address the problem of monitoring a large 
complex physical system — NASA’s Reverse Water Gas 
Shift system — perhaps the largest and most complex hy- 
brid DBN developed to date. This paper makes contri- 
butions both to the modeling and the monitoring of com- 
plex nonlinear systems. On the modeling side, we have 
shown how to model physical systems whose effects man- 
ifest themselves at dramatically different time scales, and 
that involve biased sensors, where the bias is state depen- 
dent and varies over time. On the monitoring side, we have 
presented a general framework for approximating nonlin- 
ear behavior using integration methods that extend the Un- 
scented Filter, improving the accuracy of the approxima- 
tion with minimal additional computation. Experimen- 
tal results indicate that this approach is much faster and 
more reliable than particle filtering. More generally, we 
have demonstrated the feasibility of hybrid DBNs for mon- 
itoring a complex real-world physical system such as the 
RWGS using real data. 

There are several interesting directions for future work. 
The tracking algorithms presented in this paper assume a 
known mode of operation, i.e., all the discrete variables are 
observed. Our long-term goal is to diagnose the RWGS 
when components fail. In order to track both the discrete 
and continuous state, we intend to combine the results pre- 
sented in this paper with algorithms that handle hidden dis- 
crete events such as Rao-Blackwellized Particle Filtering 
(RBPF) [Doucet et al, 2000] or the algorithms presented 
in [Lemer and Parr, 2001; Lemer et al, 2000]. The speed 
of our algorithm (taking just 20ms to generate a Gaussian 
over all the state variables) is a promising indication that 
we can use these techniques for real-time fault diagnosis. 
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